iTRAQ proteomics of sentinel lymph nodes for identification of extracellular matrix proteins to flag metastasis in early breast cancer

Patients with early breast cancer are affected by metastasis to axillary lymph nodes. Metastasis to these nodes is crucial for staging and for the quality of surgery. Sentinel lymph node biopsy, which is currently used to assess lymph node metastasis, is not effective. This necessitates identification of biomarkers that can flag metastasis. Early-stage breast cancer patients were recruited. Surgical resection of the breast was followed by identification of sentinel lymph nodes. Fresh frozen section biopsy was used to assign metastatic and non-metastatic sentinel lymph nodes. The discovery phase included iTRAQ proteomics coupled with mass spectrometric analysis to identify differentially expressed proteins. Data are available via ProteomeXchange with identifier PXD027668. Validation was done by bioinformatic analysis and ELISA. There were 2398 unique protein groups and 109 differentially expressed proteins when comparing metastatic and non-metastatic lymph nodes. Forty-nine proteins were up-regulated and sixty proteins were down-regulated in the metastatic group. Bioinformatic analysis showed ECM-receptor interaction pathways to be implicated in lymph node metastasis. ELISA confirmed up-regulation of ECM proteins in metastatic lymph nodes. ECM proteins have the requisite parameters to be developed as a diagnostic tool to assess the status of sentinel lymph nodes and guide surgical intervention in early breast cancer.

The procedures followed were in accordance with the ethical standards of the Declaration of Helsinki. Early breast cancer patients were screened and admitted at the Department of Surgery. The study was explained in detail to the recruited patients and informed consent was obtained from them. Bedside examination included clinical history, symptoms, signs and general examination. Patients suspected of having early breast cancer were subjected to mammography imaging. As explained later in this section, sentinel lymph node biopsy tissues from these patients were sent to the Department of Pathology for histopathology, which was the gold standard for assigning the clinical phenotypes of sentinel lymph node metastasis (SLNM+) and sentinel lymph node without metastasis (SLNM-). For the discovery phase of the proteomic experiment, we took 5 patients with SLNM+, 5 patients with SLNM- and two benign breast tumor tissues as cancer controls. For the validation phase of the experiment, we took 13 patients each with SLNM+ and SLNM-.

Patient inclusion and exclusion criteria.
Staging of the breast cancer was determined as per the American Joint Committee on Cancer (AJCC) cancer staging criteria. Inclusion and exclusion criteria were the same as in our previous study 29 . Women with early invasive intra-ductal breast cancer as per the WHO classification of the tumor, and who had not undergone any therapeutic intervention, were recruited into the study. Patients with advanced breast cancer, and patients with early breast cancer who had received either chemotherapy or radiotherapy, were excluded from the study.
Identification and excision of sentinel lymph node. Technetium-tagged sulphur colloid was injected intradermally into the lower inner quadrant of the affected breast two hours prior to surgery 29 . In the operating theatre, 1 ml of 1% fluorescent methylene blue dye diluted in 4 ml saline was injected intradermally at multiple sites around the areolar region and in the sub-areolar region. After five minutes of gentle massage, an incision was made on the axillary skin crease at the site of maximum radioactivity. Using blunt and sharp dissection, methylene-fluorescent lymphatics were identified using a blue light lamp, and blue lymphatics were identified by direct visualization 30 . The lymph node with the highest radioactivity count was considered the sentinel node and was excised.
Sentinel lymph node tissue sample collection. After excision of the sentinel lymph node, adherent fat tissue was carefully removed, and blood stains were washed off thoroughly with 1X PBS (pH 7.4) 29 . The nodes were longitudinally sectioned to obtain 5 mm thick slices. One set of alternate slices was sent to the Department of Pathology for histopathological assessment. The other set of slices was taken to the proteomics facility and stored at − 80 °C for iTRAQ-based proteomic experiments.
Histopathology. The histopathological procedures followed were those standardized in our previous study 29 . Lymph node slices were formalin-fixed and embedded in paraffin blocks. These were further sectioned at 4 μm onto poly-L-lysine-coated slides. The paraffin sections were deparaffinised with three successive washes in xylene and then rehydrated by washing them stepwise in 100% ethanol, 90% ethanol, 70% ethanol and distilled water. The sections were stained with hematoxylin and washed in running water for 5 min. The slides were then stained in eosin solution for two minutes and rinsed with 95% ethanol. The slides were then immersed in 100% ethanol for two minutes, twice. After a final exposure to xylene, a drop of Distyrene Plasticizer Xylene (DPX) was used to mount the tissue on each slide, which was then covered with a glass cover-slip.
Mammography. All recruited patients in the study underwent full-field digital mammography in craniocaudal and medio-lateral oblique projections. The effective dose of a four-view mammogram ranged from 4 to 6 mGy. The mammograms were evaluated according to the Breast Imaging Reporting and Data System (BIRADS) classification.
Sample phenotyping and protein isolation. Phenotyping and protein isolation were done using protocols standardized in our previous study 29 . The sentinel lymph node tissue sections stored at − 80 °C were annotated as either SLNM+ or SLNM- based on the histopathology findings. The tissue samples were minced and the proteins were solubilized in 120 μl of lysis buffer containing 8 M urea, 2 M thiourea and 4% 3-[(3-cholamidopropyl)dimethylammonio]-1-propanesulfonate (CHAPS). The tissue was homogenized by sonication at intervals of 3 s and vortexed for 2 min. The samples were then centrifuged at 15,000 rpm for 20 min at 4 °C, the debris was discarded and the supernatant was transferred to a fresh Eppendorf tube. Protein extracted with the lysis solution was buffer-exchanged with 100 mM triethylammonium bicarbonate (TEAB) using 3 kDa cut-off membrane filters to bring the concentration of urea well below 0.1 M. Protein amount was quantified by the Bradford assay using 1 μg/μl of bovine serum albumin as a standard.
Isobaric tags for relative and absolute quantitation (iTRAQ) labelling. Five sets of SLNM+ tissue samples, five sets of SLNM- tissue samples and two cancer control benign breast tumor tissues were taken for the iTRAQ experiment. The design of the iTRAQ experiment is illustrated in Fig. 1. Each experiment comprised at least one SLNM+ sample, one SLNM- sample, and one sample of either of these two phenotypes or a cancer control benign breast tumor tissue. An equimolar pool of the three phenotypes was used as an internal standard for normalization in each of the four experiments 27 . 80 μg of protein from each phenotype sample was reduced with 25 mM DTT for 30 min at 60 °C and alkylated with 55 mM iodoacetamide for 20 min at room temperature. Each protein sample was digested for 16 h with trypsin at a 1:10 ratio at 37 °C. Digested peptides were then labelled with iTRAQ 4-plex reagents, 114 (sentinel lymph node metastasis), 115 (sentinel lymph node without metastasis), 116 (one of either of these two phenotypes or cancer control benign breast tumor tissue) and 117 (internal standard), following the protocol provided by the manufacturer (AB Sciex, Foster City, CA, USA). In brief, all vials of iTRAQ labelling tags were reconstituted in 70 μl of absolute ethanol. This was then added to each sample and incubated for 2 h at room temperature, and the reaction was quenched using 50 μl of Milli-Q water. The iTRAQ-labelled samples in each experiment were then pooled into a single vial and dried using a SpeedVac. These samples were reconstituted in 8 mM ammonium formate buffer (pH 3) and fractionated by cation exchange using an isotope-coded affinity tag cartridge. Peptides were then eluted by gradient elution with 500 μl of ammonium formate (pH 3) over a concentration range of 5 mM to 500 mM to obtain a total of eleven fractions from each experiment. The 44 fractions from the four experiments were vacuum dried and taken for mass spectrometry analysis.
Mass spectrometry data acquisition. In-house protocols were used for mass spectrometry data acquisition 27 . Peptides from the 44 fractions were desalted and concentrated using reversed-phase ZipTip, and reconstituted in 0.1% formic acid. The peptide fractions were loaded onto an analytical column (Acclaim PepMap RSLC C18, 2 μm, 100 Å, 50 μm x 15 cm; Thermo Scientific, Rockford, USA) connected to a trap column (Acclaim PepMap C18, 3 μm, 100 Å, 75 μm x 2 cm; Thermo Scientific, Rockford, USA). Peptide separation was performed using an EASY-nLC 1200 coupled with an Orbitrap Fusion Tribrid Mass Spectrometer (Thermo Scientific, Rockford, USA) for mass spectrometry analysis. The peptide fractions were premixed in loading buffer (mobile phase A: 100% water and 0.1% formic acid) and 1 μg was loaded on the trap column at a flow rate of 300 nl/min. The retained peptides were washed isocratically with loading buffer for 45 min to remove excess salt. The peptides were then resolved on the analytical column with a multi-step linear gradient of loading buffer and elution buffer (mobile phase B: 80% acetonitrile and 0.1% formic acid) at a flow rate of 250 nl/min. The gradient elution was initiated using 5% elution buffer and held for 1 min, with a linear increase rate of 10% for 10 min, 35% for 70 min and 50% for 80 min. The gradient was held at 80% mobile phase B for 8 min before being re-equilibrated to 5% mobile phase B for 18 min. The mass spectrometer was operated in data-dependent acquisition (DDA) mode. The full MS spectra were acquired in positive ion mode in m/z ratio 31 . Both MS and MS/MS spectra were searched using the Sequest HT algorithm against a combined UniProt Human proteome database appended with a list of common contaminants provided by Thermo Scientific.
Sequest HT parameters were specified as trypsin enzyme, two missed cleavages allowed, minimum peptide length of 6, precursor mass tolerance of 10 ppm and a fragment mass tolerance of 0.05 Da. The static modification was set to carbamidomethylation (+ 57.021 Da) of cysteine. The dynamic modifications were set to methionine oxidation (+ 15.995 Da) and iTRAQ 4-plex (+ 144.102 Da) modification on peptide N-termini and lysine (K) residues. Since the iTRAQ modification was used as a dynamic modification, the unmodified or unlabelled peptides and associated proteins were removed from the analysis. A dynamic modification was also assigned for acetylation (+ 42.011 Da) of the protein N-terminus. Peptide spectral match (PSM) error rates were determined using the target-decoy strategy coupled to the Percolator PSM validation node to distinguish true from false matches. In the Percolator node, the false discovery rate (FDR) was calculated based on the q-values of the decoy database search. Data were filtered at the peptide spectral match level using a strict FDR cut-off of 0.01 and a relaxed FDR cut-off of 0.05 as determined by Percolator. Contaminant and decoy proteins were removed from all data sets prior to downstream analysis. Reporter ion values were calculated in the "Reporter Ions Quantifier" node using the FTMS mass analyzer setting and HCD activation. Reporter ions were quantified from MS/MS scans using an integration tolerance of 20 ppm with the most confident centroid setting. The following settings were used to obtain the quantification results: the protein ratio type was the 'weighted' geometric mean, normalization was by summed intensities and outlier removal was 'automatic'. The peptide threshold was 'at least homology', where the peptide score does not exceed the absolute threshold but is an outlier from the quasi-normal distribution of random scores.
A minimum of two unique peptides were required to be the top-ranking matches. In the Consensus workflow, in the "Reporter Ion Quantifier" node, the following settings were applied to increase the quantification accuracy of the analyzed proteins: (i) only unique peptides were used for protein quantification, (ii) the precursor co-isolation threshold was set to 25%, (iii) the average reporter signal-to-noise ratio threshold was set to 10 and (iv) peptide normalization was done with respect to the total peptide amount. At the level of protein analysis, further normalization was done in which the protein abundance of each individual sample was scaled by the abundance of the internal standard, which was labelled with channel 117 of the iTRAQ 4-plex reagent. On obtaining the results, multiple filter criteria were applied, and only those proteins were considered for differential expression analysis which had: (i) an FDR confidence threshold of medium during identification by Sequest HT, (ii) at least two unique peptides and (iii) a peak found in the sample. Expression fold change ratios of ≥ 1.5 and ≤ 0.66 were considered to indicate up-regulated and down-regulated proteins, respectively. Proteins with a fold change ratio between 0.66 and 1.5 were considered house-keeping proteins. A multiple Student's t-test was applied to the whole set of differentially expressed proteins, and a volcano plot with a p-value < 0.05 was generated to graphically represent the up-regulated, down-regulated and house-keeping proteins. Those proteins that had a consistent expression pattern in at least four of the six experiments were considered to be potential biomarker candidates to differentiate SLNM+ and SLNM-. The relative abundance ratios of only the up-regulated proteins were compared between the metastatic group and cancer control breast tissues to estimate the possible tissue source.
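The internal-standard scaling and the fold-change rule described above can be illustrated with a minimal sketch. It assumes reporter-ion abundances are already available as plain numbers; the function names and the example abundances are hypothetical and are not taken from the study's quantification software.

```python
def normalize_to_standard(abundance, standard_abundance):
    """Scale a sample abundance by the internal-standard (channel 117) abundance."""
    if standard_abundance <= 0:
        raise ValueError("internal standard abundance must be positive")
    return abundance / standard_abundance


def classify_fold_change(ratio, up=1.5, down=0.66):
    """Label an SLNM+/SLNM- ratio using the thresholds stated in the text."""
    if ratio >= up:
        return "up-regulated"
    if ratio <= down:
        return "down-regulated"
    return "house-keeping"


# Hypothetical abundances for one protein in one 4-plex experiment.
slnm_pos = normalize_to_standard(900.0, 300.0)  # channel 114 / channel 117
slnm_neg = normalize_to_standard(450.0, 300.0)  # channel 115 / channel 117
ratio = slnm_pos / slnm_neg
label = classify_fold_change(ratio)
print(ratio, label)
```

Because both channels are divided by the same internal standard within an experiment, the scaling matters mainly when ratios are compared across separate 4-plex experiments.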

KEGG pathway and Gene Ontology analysis.
Differentially expressed genes were imported into the DAVID 32 (Database for Annotation, Visualization and Integrated Discovery, version 6.8, https://david.ncifcrf.gov/tools.jsp) functional annotation tool and functional enrichment analysis was performed. Homo sapiens was used as the background species and the enrichment analysis was run for Cellular Component in Gene Ontology (GO_CC) and Kyoto Encyclopaedia of Genes and Genomes (KEGG) pathways. Only those results that had FDR-adjusted p-values ≤ 0.05 were considered.
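Filtering on FDR-adjusted p-values can be sketched as follows. This is an illustration only: it assumes a Benjamini-Hochberg adjustment, which is one common FDR procedure and is not confirmed by the text as the exact adjustment applied, and the term names and raw p-values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical enrichment terms with raw p-values.
terms = ["term_A", "term_B", "term_C", "term_D"]
raw_p = [0.01, 0.04, 0.03, 0.20]
adj_p = benjamini_hochberg(raw_p)
kept = [t for t, q in zip(terms, adj_p) if q <= 0.05]
print(kept)
```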
ELISA. Twelve differentially expressed proteins (α-crystallin B chain, monoamine oxidase, caveolin-1, collagen α-1, desmin, fibrillin-1, long-chain-fatty-acid-CoA ligase 1, laminin subunit α-4, heterogeneous nuclear ribonucleoprotein D, non-histone chromosomal protein, cathelicidin antimicrobial peptide and rho GDP-dissociation inhibitor 2) were chosen for the validation phase of the experiment. The concentrations of these proteins were quantified in the validation set of 13 SLNM+ and 12 SLNM- patients using ELISA according to the manufacturer's instructions (Bioassay Technology Laboratory, Shanghai, China). All determinations were performed in duplicate. Differences between the SLNM+ and SLNM- groups of patients were calculated using an independent Student's t-test; values of p < 0.05 were considered significant.
Statistical analysis. Normalization of the proteins was done using MetaboAnalyst (version 5.0) software (https://www.metaboanalyst.ca/MetaboAnalyst/ModuleView.xhtml) 33 using the sum of protein abundances. Data were transformed using the generalized logarithm and scaled using the Pareto scaling option. Data analysis for ELISA was carried out using STATA version 16.0. Protein concentrations that could be used as cut-offs to distinguish between the metastatic and non-metastatic states were estimated based on Receiver Operating Characteristic (ROC) analysis of the ELISA data. The non-parametric ROC analysis was carried out using the DeLong method. The area under the curve (AUC) was obtained with 95% confidence limits. The optimum cut-off value was taken as the point at which the Youden index (sensitivity + specificity − 1) was maximum. Percentage correct classification and likelihood ratio values were computed. A statistical significance level of P < 0.05 was adopted to test the significance of the AUC.
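The cut-off selection by the Youden index can be sketched in a few lines. This is a minimal illustration, not the STATA procedure used in the study: the function name, the decision rule (a sample is called metastatic when its concentration is at or above the cut-off) and the example concentrations are assumptions, and no confidence limits are computed.

```python
def youden_cutoff(values, labels):
    """Return (cutoff, J, sensitivity, specificity) maximizing J = sens + spec - 1.

    labels: 1 = metastatic, 0 = non-metastatic.
    A sample is called positive when value >= cutoff (assumed rule).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes are required")
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if y == 1 and v >= cutoff)
        tn = sum(1 for v, y in zip(values, labels) if y == 0 and v < cutoff)
        sens = tp / positives
        spec = tn / negatives
        j = sens + spec - 1
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical ELISA concentrations (arbitrary units) and phenotypes.
concentrations = [10.0, 12.0, 15.0, 20.0, 22.0, 25.0]
phenotypes = [0, 0, 0, 1, 1, 1]
cutoff, j, sens, spec = youden_cutoff(concentrations, phenotypes)
print(cutoff, j, sens, spec)
```

Candidate cut-offs are restricted to observed values, which is sufficient because sensitivity and specificity only change at those points.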
Ethics approval and consent to participate. This

Result
Clinical profile. Based on the inclusion and exclusion criteria, a total of twenty-eight patients were recruited to this study. Of these, thirteen patients had sentinel lymph node metastasis, thirteen patients did not have sentinel lymph node metastasis, and two patients diagnosed with fibroadenoma were recruited to procure breast tissue that would serve as cancer controls. From among these, five with SLNM+, five with SLNM- and the two cancer control benign breast tumors were chosen for the discovery phase of proteomic experiments by iTRAQ. The clinical profile of these twelve patients is provided in Table 1. All patients with early breast cancer were confirmed by mammography, and sentinel lymph node tissue samples were annotated as with or without metastasis using histopathological evaluation. The mammography images of patients who were considered for the discovery phase of the study are shown in Fig. 2. The assessment of mammograms was performed according to the scores of the BIRADS (Breast Imaging Reporting and Data System) classification. The hematoxylin   Fig. 1). This process adjusted for differences among samples through data transformation and scaling, making individual protein expression values comparable across the metastatic and non-metastatic lymph node groups. A pair-wise multiple Student's t-test was used to arrive at 109 differentially expressed proteins between SLNM+ and SLNM-, of which 49 proteins were up-regulated and 60 proteins were down-regulated, as shown in the volcano plot (Fig. 4). The relative abundance ratios of only the up-regulated proteins were compared between the metastatic group and cancer control breast tissues.
Proteins such as desmin, fibrillin 1, tau-tubulin kinase, transgelin, calponin 1 and myosin 11, which had a ratio of more than one, are native to the lymph node, whereas proteins such as heat shock protein 6, α-crystallin B, amine oxidase 3, caveolin 1, collagen α1, fibrinogen gamma chain, GAPDH, long-chain fatty acid CoA ligase, laminin subunit α4, membrane primary oxidase, microfibril-associated glycoprotein 4, perilipin 1 and redox regulatory protein, which had a ratio of less than one, are derived from breast tissue. Based on their functional annotation and relevance to this study, a few of these proteins are discussed in Tables 2 and 3.
Validation by bioinformatic analysis and ELISA. KEGG pathway analysis highlights the role of the differentially expressed proteins in various pathways (Fig. 5a). The most interesting feature is the ECM-receptor interaction pathway, which implicates seven proteins that include various isoforms of collagen-α. Apart from ECM-receptor interaction, focal adhesion and PI3K-Akt signalling feature prominently in the pathway analysis. Gene Ontology for Cellular Component (GO_CC) enrichment carried out for the 49 up-regulated proteins shows that the majority of the proteins belong to the extracellular component (Fig. 5b). The component with the next highest number of proteins was the focal adhesion component. Focal adhesions are large macromolecular assemblies through which mechanical force and regulatory signals are transmitted between the extracellular matrix and an interacting cell. The implication of extracellular proteins in breast cancer metastasis is therefore quite evident. Gene Ontology for Cellular Component (GO_CC) enrichment carried out for the 61 down-regulated proteins shows the majority of the proteins being confined to the cytoplasmic, cytosol and membrane components (Fig. 5c). Based on the results of the bioinformatic studies, ELISA was performed on the extracellular proteins up-regulated in SLNM+. Caveolin 1, desmin, microfibrillar-associated glycoprotein, collagen α4 and fibrillin 1 were confirmed to be elevated in SLNM+ as compared to SLNM- (Fig. 6) (Supplementary Table 1). These proteins have a minimum of two-fold higher expression in SLNM+ as compared to SLNM-.

Discussion
The predominant age group of early breast cancer patients recruited in our study was 36-59 years. Most of the patients presented with a lump, nipple discharge, nipple retraction or pain in one part of the breast. Progesterone, estrogen and HER-2 status did not show any correlation with the lymph node status of early breast cancer. Patients clinically diagnosed with early breast cancer were subjected to mammography screening to confirm the type of breast cancer. Twenty five patients were diagnosed with early breast cancer on the basis of characteristic features such as abnormal masses and collections of calcification. As normal control tissues could not be procured due to ethical concerns, two fibroadenoma tissues that were the closest representations of 'cancer control' breast tissues were used to determine the source of the proteins. Methylene blue was used to map the sentinel lymph node in the axillary region, followed by its excision. Histopathological analysis was done to annotate the tissue phenotype as SLNM+, SLNM- or benign tumor. Proteins from sentinel lymph node tissues were isolated, quantified, digested with trypsin and labelled with different isobaric tags, then combined into one sample mixture for identification and quantification by LC-MS/MS analysis. Isobaric tags for relative and absolute quantitation (iTRAQ) technology relies on the quantitation of low molecular mass reporter ion groups released from isobaric tags that covalently bind to the primary amines of the tryptic peptides to be quantified. The experiment was designed to enable: (i) six phenotypic protein profile comparisons, (ii) intra-experimental normalization and (iii) an understanding of the cellular source of the protein.
While the up-regulated proteins in SLNM+ are related to tumorigenesis, cell proliferation, motility, cell survival, progression and anti-apoptosis, the up-regulated proteins in SLNM- are involved in suppressing cell motility and decreasing cell growth. Expression of five proteins, caveolin-1, collagen α1, desmin, fibrillin-1 and microfibrillar-associated glycoprotein 4, was validated and found to be consistent with the discovery phase results. The functions of these proteins in the context of understanding sentinel lymph node metastasis are: (a) Caveolin-1 (Cav-1), a 22 kDa small oligomeric scaffolding protein encoded by the CAV1 gene, is a major structural protein of the membrane structures called caveolae and plays a crucial role in many cellular processes, including endocytosis, receptor internalization, ECM organization, lipid transport and signal transduction 86,87 ; (b) Collagen α type 1 (COL1A1), a 138 kDa protein encoded by the COL1A1 gene, is the most abundant protein of the extracellular matrix; it forms a characteristic triple helix structure of three polypeptide chains, contributes to the integrity, elasticity and strength of the body's connective tissues and to the entrapment, local storage and delivery of growth factors and cytokines, and therefore plays an important role during organ development, wound healing and tissue repair 88,89 ; (c) Desmin (DES), a 53 kDa protein encoded by the DES gene, is a muscle-specific protein and a key subunit of the intermediate filament in cardiac, skeletal and smooth muscles, and plays an essential role in maintaining extracellular matrix interactions, cytoarchitecture, structural integrity and function of muscles by forming a three-dimensional scaffold across the sarcomeres of smooth muscles 90,91 ; (d) Fibrillin-1 (FBN1), a 312 kDa protein encoded by the FBN1 gene, is a large cysteine-rich glycoprotein produced by fibroblasts and is the principal structural component of the extracellular matrix, forming microfibrils in the connective tissue. Interacting with other components of the extracellular matrix (ECM), this ubiquitous glycoprotein exerts pivotal roles in tissue development, homeostasis and repair. In addition to mechanical support, FBN networks also exhibit regulatory activities on growth factor signalling, ECM formation, cell behaviour and the immune response 92,93 ; and (e) Microfibrillar-associated glycoprotein 4 (MFAP4), a 36 kDa protein encoded by the MFAP4 gene, is an extracellular matrix protein that plays a major role in elastin fiber formation, is associated with ECM remodeling processes during vascular injury, and interacts with other ECM proteins such as FBN1, supporting cell adhesion, intercellular interactions and the assembly and/or maintenance of elastic fibres 94,95 .
Associates with TGFβ/Smad-3; promotes invasion 71-73. Regulates the production of MMP2; promotes metastasis 72,73. Table 2. Proteins up-regulated in tissue of sentinel lymph node with metastasis as compared to sentinel lymph node without metastasis.
Label-based differential proteomic experiments, pathway analysis, Gene Ontology studies and ELISA experiments clearly establish the role of extracellular matrix proteins in sentinel lymph node metastasis. Extracellular matrices (ECMs) are highly specialized and dynamic three-dimensional (3D) scaffolds in which cells reside in tissues, and their principal components are collagens, glycoproteins and proteoglycans 96 . Upon physiological and pathological triggers, ECM-degrading enzymes, matrikines, are released to remodel the ECM, to re-establish an appropriate functional meshwork and maintain cellular homeostasis 97,98 . In metastasis, however, ECM remodeling is hijacked, and the ECM architecture is perturbed and degraded by matrix metalloproteinases 99,100 .
Due to ECM degradation, endothelial integrity is lost, allowing cancer cells to escape from the primary tumor to other tissues including lymph nodes 101 . During this process of metastasis, cancer cells undergo epithelial-to-mesenchymal transition (EMT), which can be induced by increased deposition of ECM proteins 102 . This alters the phenotypic properties of cells and affects their propensity to escape the primary tumor and cause metastasis 103 . In addition, up-regulation of ECM proteins through recruitment and activation of cancer-associated fibroblasts (CAF) results in activation of biophysical and biochemical oncogenic signalling pathways 104 . The oncogenic signalling pathways of the identified extracellular matrix proteins are: (a) caveolin-1: PI3K/AKT and Ras/Raf signalling through ERK; (b) desmin: PI3K/AKT through caspase; (c) microfibril-associated glycoprotein 4: ERK/MMP signalling through FAK and c-Jun; (d) collagen α-1: FAK signalling through PI3K/AKT and MAPK/ERK; and (e) fibrillin 1: SMAD2/3/4 and MEK pathway through ERK. These induce cell proliferation, survival, motility, angiogenesis, hypoxia, cancer stem cell activity, epithelial-to-mesenchymal transition and eventually lymph node metastasis 105-109 . Caveolin-1, desmin and collagen α-1 mediate their functions through PI3K/Akt signalling, which is represented alongside ECM-receptor interaction in the pathway analysis. An overview of this detailed analysis is pictorially represented in Fig. 7. It is therefore the interplay between the up-regulated extracellular matrix proteins, active growth factors of cancer cells, fibroblasts and signalling pathways that promotes lymph node metastasis.
Caveolin-1, desmin, microfibril-associated glycoprotein 4, fibrillin 1 and collagen α-1 have been identified as potential biomarkers that can discriminate metastatic from non-metastatic sentinel lymph nodes in early breast cancer. To understand their ability to differentiate the two clinical phenotypes, ROC curves were plotted based on the ELISA concentrations ( Supplementary Fig. 2). The areas under the curve were estimated to range from 0.81 to 1.0 for these five proteins. The diagnostic parameters of sensitivity, specificity, positive predictive value and negative predictive value are over 80% for each, making them fairly accurate as a translational tool (Table 4). Caveolin-1 and desmin, with cut-off values of 17.4 and 28.5 ng/μg of tissue protein respectively, are particularly promising candidates with 100% values for all the diagnostic parameters. Since sensitivity and specificity are independent of prevalence, whereas predictive values are not, the practical utility of these two markers needs to be validated in populations with varying prevalence rates.
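The dependence of predictive values on prevalence can be made concrete with a short sketch based on Bayes' rule. The function name and the sensitivity, specificity and prevalence values below are hypothetical and chosen only for illustration; they are not results from this study.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given disease prevalence."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    tp = sensitivity * prevalence              # true-positive fraction
    fp = (1 - specificity) * (1 - prevalence)  # false-positive fraction
    tn = specificity * (1 - prevalence)        # true-negative fraction
    fn = (1 - sensitivity) * prevalence        # false-negative fraction
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("predictive value undefined for these inputs")
    return tp / (tp + fp), tn / (tn + fn)


# Same hypothetical test (90% sensitivity and specificity), two prevalences.
ppv_high, npv_high = predictive_values(0.9, 0.9, 0.5)
ppv_low, npv_low = predictive_values(0.9, 0.9, 0.1)
print(ppv_high, npv_high)
print(ppv_low, npv_low)
```

With sensitivity and specificity held fixed, lowering the prevalence from 50% to 10% reduces the positive predictive value from about 0.90 to about 0.50 in this example, which is why validation across populations matters.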

Conclusion
An iTRAQ-based proteomic experiment is an ideal platform for comparative protein profiling to identify potential biomarker candidates for sentinel lymph node metastasis in early breast cancer. The extracellular matrix proteins caveolin-1, collagen α-1, desmin, fibrillin-1 and microfibrillar-associated glycoprotein 4 have been identified as potential biomarkers that can differentiate the two metastatic states of sentinel lymph nodes. Each of these, by itself or as part of a collective panel, offers translational scope for the design of 'on-table' diagnostics to flag sentinel lymph node metastasis in early breast cancer.

Data availability
The mass spectrometry proteomics data have been deposited to the ProteomeXchange 110 Consortium via the PRIDE partner repository with the dataset identifier PXD027668.  www.nature.com/scientificreports/